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Abstract 

We discuss the problem of defining an estimate for the error in Quasi-Monte 
Carlo integration. The key issue is the definition of an ensemble of quasi- 
random point sets that, on the one hand, includes a sufficiency of equivalent 
point sets, and on the other hand uses information on the degree of uniformity 
of the point set actually used, in the form of a discrepancy or diaphony. A 
few examples of such discrepancies are given. We derive the distribution of 
our error estimate in the limit of large number of points. In many cases, 
Gaussian central limits are obtained. We also present numerical results for 
the quadratic star-discrepancy for a number of quasi-random sequences. 



1 The error problem 

We discuss the problem of integration of a square integrable function over 
the s-dimensional unit hypercube K = [0, using a set of N points Xk, 
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k = 1,2, . . . , N. The actual integral is J = J K dxf(x), and its numerical 
estimate is given by 

s = ^E /(**) • (i) 

k=l 

Depending on the way in which the points OCfc £1X6 chosen, we distinguish 
different integration methods: if the points come from some predetermined, 
deterministic scheme we have a quadrature rule, if they are considered to be 
iid uniform random numbers, we have Monte Carlo. An intermediate case is 
that of Quasi-Monte Carlo, where the points are considered to be part of a 
low-discrepancy sequence, but share the ergodic properties of a truly random 
sequence^. The integration error is defined as r\ = S — J. Good integration 
methods are characterized by the fact that they typically lead to a small 
value of 7], but, more importantly, allow one to obtain a good estimate of 
rj. In the case of Monte Carlo, rj is a stochastic variable, and hence has a 
probability density P{rj). For quasi-random point sets used in Quasi-Monte 
Carlo, we may (as we shall specify more precisely later on) also interpret rj as 
having such a probability density: its form is the subject of this contribution. 

In true Monte Carlo, the error distribution P(rj) is obtained by viewing 
the point set {x\, x%, . . . , xn} as a typical member of an ensemble of random 
point sets, governed by the obvious Cartesian combined probability density 

P N (xi,x 2 ,...,x N ) = 1 , (2) 

so that the x^ are indeed iid uniform random numbers. The error 77 is then 
a random variable over this ensemble, with the following well-known results. 
In the first place, S is an unbiased estimator of J in the sense that (if) = 0, 
where the average is over the ensemble of points sets. In the second place, 
for large N, P{N) approaches a normal distribution according to the Central 
Limit Theorem. Finally, the variance of this distribution is given by (rf) = 
Var(/)/iV, where Var denotes the variance. Note that, since we average 
over the integration points, the error distribution can depend only on the 
integrand itself. 

The conceptual problem in the use of quasi-random rather than truly 
random point sets is the following: a quasi-random point set is not a 'typical' 
set of random points! Indeed, quasi-random point sets are very special, with 
carefully built-in correlations between the various points so that each new 
point tends to 'fill a gap' left by the previous ones. The usual Monte Car- 
lo error estimate is therefore not really justified in Quasi-Monte Carlo. On 
the other hand, many different error bounds assure us that small errors are 
possible, and indeed likely when we apply low-discrepancy point sets. In the 
following, we shall discuss two approaches to a solution of this conundrum. 
Obviously, we can only summarize the main results here: technical details 
and pictures can be found in the references. 

5 We shall not discuss the case of point sets with fixed, predetermined N. 



2 The Bayesian approach 



The first way around the aforementioned conceptual problem is to inter- 
change the roles of integrand and point set: we view the integrand f(x) as 
a typical member of some underlying class of functions and average over 
this class, so that the error depends only on a property of the point set. In 
practice, the choice of function class often entails a good deal of idealism or 
pot luck, as usual in a Bayesian approach to probability. We discuss several 
examples, in which we denote by ()^ an average over the probability measure 
governing the function class. 

2.1 The Wozniakowski Lemma 

Let the integrand f(x) be chosen according to the Wiener (sheet) measure 
in s dimensions. This measure is Gaussian, with 

(/(*)), = , (f(x),f{y)) f = nmin(x^) , (3) 

where the index \x labels the coordinates. We may then quote the Lemma 
from 0: 

(v) f = Q , (v 2 ) f = D 2 (x* lJ x* 2} ...,x* M ) , (4) 

where D 2 stands for the L 2 norm of the well-known star-discrepancy, and 
the x* k denotes the 'reflected' point, with [x%Y = 1 — x M . In p| it is shown, 
moreover, that the distribution P(rj) in this case is a Gaussian. 

We have here the interesting general fact that the choice of a particular 
function class induces its own particular discrepancy. On the other hand, in 
many cases (such as in particle physics phenomenology) the Wiener measure 
is certainly not appropriate since it is dominated by integrands that are 
not locally smooth. In ||, folded Wiener sheet measures are studied with 
analogous results, but then again these describe functions that are much too 
smooth. 

2.2 Induced discrepancies 

In 0, we established the following general result. Let the measure on the 
function class be such that 

(f( x i))f = Q > (f(x 1 )f(x 2 )) f = J dy h{x 1 ,y)h{x 2 ,y) (5) 

for all Xi 5 2 in K, for some h(x,y). There is then an induced quadratic dis- 
crepancy, defined as follows: 

f 1 N r 

(v 2 ) f = J dy g{yf , g(y) = — J2 h ( x k,y) - J dx h(x,y) . (6) 



Note that h is not necessarily in the same function class as the /, and indeed 
y may be defined in a space quite different from K. Note that, whenever 
the function class measure is Gaussian, then P{rj) will also be Gaussian. 
Generalizations to higher moments can be found in 0, f|] . 

2.3 Orthonormal function bases 

As an example in s = 1, let u n (x) be an orthonormal set of functions on K, 
as follows: 

Uo(x) = 1 , J dx u m (x)u n (x) = 5 mjn , (7) 

K 

with m, n — 0, 1, 2, — Let f(x) admit of a decomposition 

f( x ) = E^M^) > ( 8 ) 

n>0 

and choose the measure such that the v n are normally distributed around 
zero, with variance o\. The induced quadratic discrepancy D%" th is then 
definedf] as 

1 1 N 

( 7 l 2 ) f =N D °i th > D i th = ^E^E u n {x k )u n {x{) . (9) 

1 iV iV n>0 k,l=l 

A special case is that of the Fourier class, which has 

u 2n = V / 2cos(27rnx) , v,2 n ~i = v^2sin(27rnx) , n = 1, 2, 3, . . . (10) 

The physically reasonable requirement that the phase of each mode n (made 
up from «2n-i an d u 2n ) is uniformly distributed forces us to have the Gaus- 
sian measure, with in addition a 2n _i = a 2n , which property corresponds to 
translational invariance. We then have 

2 

(11) 

Obviously, other orthonormal function bases are also possible, such as the 
system of Walsh functions; a further discussion, including the straightforward 
generalization to higher dimension, can be found in §|, ^, 0| • Note that all 
such quadratic discrepancies are nonnegative by construction. 

3 The discrepancy-based approach 

Another way of establishing integration error estimates, which in our opinion 
does more justice to the spirit of Monte Carlo, is the following. Instead 
of considering all point sets of iV truly random points, with the Cartesian 
probability density (Q), we restrict ourselves to those point sets that have a 
given value of discrepancy, for some predefined type of discrepancy. In this 
way, information on the discrepancy performance of one's favorite quasi-ran- 
dom number sequence can be incorporated in the error estimate. 
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3.1 Non-Cartesian distribution of points 



We have then, instead of (0), a combined probability density P^ for the N 
points as follows. Let D N (x±, . . . ,xn) be some discrepancy defined on sets 
of N points in K, and suppose its value for the actual point set that is used 
in the integration be w. Then, 



P N (w;xi, 



H(w) 



x N 



J dxi ■ ■ ■ dxN 5 (Dn(xi, . . . ,xn) — w) 
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1 

5 (D N (x 1 , . . .,x N ) - w ) 



H(w) 
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F N (w;x 1 ,...,x N ) 



(12) 



so that Fjv measures the deviation from Cartesian (iid) uniformity. The 
quantity H{w) is of course just the probability density for the discrepancy 
over the iid uniform random numbers, an object interesting in its own right. 
Let us also define marginal deviations as 



F k (w;x h . . . ,x k ) = J dx k+ i---dx N F N (w;x h ...,x N ) . 



(13) 
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We can then simply establish, for instance, that, provided Fi(w;x) vanishes 
for all x, 
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dxdy f(x)f(y)F 2 (w;x,y) 
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(14) 



where () now denotes averaging with respect to Pn(w; .). It is seen that we 
may expect a reduced error if F 2 is positive when x and y are 'close' together 
in some sense, i.e. if the points in the point set 'repel' each other. Note that 
only a small, 0(1/ N), deviation from uniformity is sufficient. 



3.2 Error probability distribution 



In many cases, it is actually possible to compute the F 2 mentioned above. 
In fact, especially in the case of discrepancies defined using orthonormal 
function bases, we can do much more. Using a Feynman-diagram technique 
described in detail in || ||, we can establish results for P[rj) as an asymptotic 
expansion in 1/N. To leading order, we have 
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1 



Z n>0 



(15) 



where the z integral runs to the left of any singularities. This result holds, for 
N asymptotically large, for any discrepancy measure based on orthonormal 
functions as discussed above, and, moreover, for any reasonable /, even if 
it is not in the function class based on these orthonormal functions. The 
1/N corrections are fully calculable, although we have not done so yet. Two 
corollaries follow immediately. In the first place, 

oo 

/ dw H(w)P(ri) = ^N/2TxVe^ 2N/2V , V = £ v* = Var(/) , (16) 

which recovers the Central Limit Theorem valid over the whole ensemble of 
iV-point point sets with any w. In the second place, we obtain an integral 
representation for H(w) by insisiting that P{rj) be normalized to unity: 



H(w) = / dz exp 

V ; 2m J P 



9 

Z n>0 



(17) 



Generalizations of these results only affect the sums over n. 



3.3 Application 1: equal strengths 

A simple model for a discrepancy is obtained by taking a n = 1/2M for n = 
1,2, ... , 2M, and zero otherwise (with trivial extension to more dimensions). 
Let us then decompose the variance of the integrand as follows: 

2M 

V = V*r(f) = '£vl = V 1 + V 2 , fi = J>n , V 2 = £ v 2 n , (18) 

n>0 n=l n>2M 

so that V\ contains that part of the variance to which the discrepancy is 
sensitive (the 'covered' part) and V 2 the rest (the 'uncovered' part). We then 
have 

H(w) = 
P(rj) ~ 

where the approximations are valid for large M. We see that a new central 
limit theorem holds, where the variance of / has been modified so that its 
covered part is reduced by a factor w, according to intuition. 
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2(wV 1 + v 2 ) 



(19) 



3.4 Application 2: harmonic model in one dimension 

Let us concentrate on the case s — 1, and take a 2n -\ = a 2n = 1/n, so that 
/ is, on the average, square integrable, but its derivative is not. In that case 
we have, 

H(w) = V f-ir- 1 ™^"™ 2 / 2 . (20) 



which is, apart from a trivial scaling, precisely the probability density of 
the Kolmogorov-Smirnov statistic. This is somewhat surprising since that 
statistic is based on the norm of the standard star-discrepancy, a to- 
tally different object. In addition, we conjecture, that for values of w small 
compared to its expectation value (w) = 7r 2 /3, we shall have 

C = Var(/)^ ; (21) 

this would again indicate a reduction of the error estimate for small w. To 
date, we have not yet been able to prove this assertion. 



P{rf) oc exp 



7] 2 N 
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3.5 The quadratic star-discrepancy 

We have also obtained some results for the quadratic form of the standard 
star-discrepanc [|], |5|. Although this is not based on orthonormal functions 
and the analysis is hence more complicated, we have obtained the moment- 
generating function G(z) = (e zw ) for asymptotically large N, where w now 
stands for N times the quadratic star-discrepancy. More precisely, we have 

G{z) = exp(V^))//^ , 

= -~£Q s (2n-i)io g (i-2z n ) , 

1 n>0 

X(z) = ^-£Q.(2n-l)-^- , 

Zn = (4/tt 2 ) % 2Z - . (22) 
W ) (2n-l) 2 V ' 

Here, Q s (m) is the number of ways in which an odd positive integer m can 
be written as a product of s positive integers, including l's. The function 
H(w) can now be computed numerically for different s values. We have done 
so, and find that H(w) very slowly approaches a Gaussian distribution as s 
increases. Indeed, the skewness of H(w) is, for large s, approximately given 
by (216/225) s / 2 so that the approach to normality is indeed slow. 



4 Numerical results for the quadratic star- 
discrepancy 

Since we now have H(w) for the quadratic star-discrepancy, we can reliably 
judge how well quasi-random number generators perform, since we can com- 
pare the discrepancy of their output with the behaviour of truly random 
points. Space does not permit us to show pictures, which can be found in 
H, H]. Here, we just describe the results. We have computed the quadratic 
star-discrepancy for a good pseudorandom generator (RANLUX), and for the 



exactly, and by Monte Carlo. The latter method is actually faster for iV 
larger than about 50,000 if we ask for 5% accuracy. We made runs of up to 
N = 150, 000, and considered the lowest and highest discrepancy value in 
subsequent intervals of 1000. These we compared with the expected value 
(2~ s — 3~ s )/iV for truly random points, and also plotted the standardized)] 

form £(u>) — (w— (w))/ yVar(w;). We considered dimensions from 1 up to 20. 
In all dimensions, RANLUX appears to mimic a truly random sequence quite 
well. The quasi-random generators perform very well in low dimensions, and 
generally the discrepancy falls further and further below than of a random 
sequence as N increases. There are exceptions, however: for instance, the 
SoboF sequence for s = 11 degrades, and is actually worse than random 
for iV ~ 60, 000, again rapidly improving for larger N. Apart from taking 
this as a warning, we have not investigated the reason for such behaviour 
in detail. The biggest surprise was when we plotted the variable £(w) (for 
instance, for iV = 150, 000) as a function of s. It appears that, as measured 
in this way, the performance of all quasi-random generators improves with 
increasing s! For s larger than 15 or so, all generators become rather bad, 
which is of course due to the fact that the onset of the asymptotic regime 
occurs for larger N as s increases. But what is more striking is the fact that 
the old-fashioned and simple Richtmyer generator performs as well as the 
modern, sophisticated SoboF and Niederreiter sequences. We take this as an 
indication that the Richtmyer generator deserves more study, in particular 
since we have not attempted any optimization of its 'lattice point'. 
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